# TBT Signal Model for Improved Accuracy of High-Level Dynamic Power Estimation Procedure

Bojan Jovanović, Ružica Jevtić, and Carlos Carreras

*Abstract* - When estimating the dynamic power consumption of DSP datapaths, it is crucial to accurately calculate the switching activity produced inside the design, and an appropriate data signal model is essential for this calculation. This paper presents a triple-bit type (TBT) signal model that represents the bit-level switching activity at the output of multipliers. The model depends on word-level signal statistics and on the number of multiplied input signals. For comparison with the standard dual-bit type (DBT) signal model, both models (TBT and DBT) are applied to the high-level power estimation of three reference designs implemented in an FPGA. With respect to the measured power, the relative errors of the TBT model presented here are four to five times smaller than those of the DBT model.

Keywords - Power estimation, TBT signal model, FPGA.

## I. INTRODUCTION

As the density of ICs increased and the number of transistors per unit area reached a critical point, heat dissipation, and consequently power consumption, became another parameter (besides speed and area) that VLSI designers must take into account. If it is not properly optimized during the design phase, power consumption can generate heat that demands increasingly expensive packaging and cooling strategies, which may either add significant cost to the system or limit the amount of functionality it can provide. In the process of power optimization it is extremely important to have tools for fast and accurate estimation of power consumption. With such tools, expensive and time-consuming iterative physical implementations of the system can be avoided. Furthermore, by applying power estimation techniques, a large number of different system architectures can be explored to find (after a few iterations) the one with the lowest power consumption. Since higher levels of design abstraction offer the largest power reduction opportunities as well as the shortest power analysis iteration times (between seconds and minutes) [1], we present the TBT signal model, which is used to calculate bit-level switching activity and is applied in a high-level power estimation procedure. The advantages of this signal model were briefly reported for the first time in [2].

Bojan Jovanović is with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: bojan@elfak.ni.ac.rs.

Ružica Jevtić and Carlos Carreras are with the Dept. of Electronics Engineering, ETSIT, Technical Univ. of Madrid, 28040 Madrid, Spain, E-mail: {ruzica,carreras}@die.upm.es.

The paper is organized as follows. In Section II some power estimation approaches are discussed. The TBT signal model is introduced in Section III, while experimental results are reported in Section IV. Section V summarizes the conclusions.

## II. POWER ESTIMATION: STATE OF THE ART

The first EDA software packages for the automation of the IC design process were equipped with various tools intended for the simulation (prediction of the IC behaviour) and analysis of circuit performance: speed, occupied area, detectability of faults, etc. As power consumption has become an increasingly important issue, many EDA packages now include tools for its estimation.

The models for power estimation differ in the nature of the power they estimate (static, dynamic, short-circuit or total power consumption), as well as in the level of abstraction of the target designs. The higher the level of abstraction, the faster the estimation; however, short estimation times at higher levels usually come at the cost of lower accuracy. Roughly, there are two different approaches to the problem of power estimation: statistical and probabilistic [3].

The statistical approaches simulate the circuit with input vectors and collect statistical data for each node in the circuit. The simplest statistical techniques for power estimation are presented in [4, 5, 6, 7]. They are accurate but memory- and time-consuming (especially for large circuits), as well as pattern-dependent. To cope with the pattern-dependence problem, some statistical approaches based on Monte Carlo simulation are presented in [8, 9, 10]. Under the assumption that the power consumed by the circuit over a sufficiently long period T is normally distributed, these techniques apply randomly generated input patterns to the primary inputs of the circuit and monitor the power dissipated in each time interval T.
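To make the Monte Carlo idea concrete, the following Python sketch, written in the spirit of [8] rather than as a reproduction of it, simulates a small hypothetical gate network (`toy_circuit`) with random inputs, counts node transitions per interval of `interval` cycles as a proxy for the power of that interval, and stops once the normal-approximation 95% confidence half-width falls below a relative tolerance `eps` of the running mean. All names and parameter values are illustrative assumptions.

```python
import random
import statistics


def toy_circuit(a, b, c, d):
    """Hypothetical 4-input gate network; returns the values of its internal nodes."""
    n1 = a & b
    n2 = c | d
    n3 = n1 ^ n2
    return (n1, n2, n3)


def monte_carlo_switching(interval=32, eps=0.02, z=1.96, min_samples=30, seed=1):
    """Estimate the mean number of node transitions per interval of `interval` cycles.

    Each sample is the transition count over one interval (a proxy for the energy
    of that interval); sampling stops when the normal-approximation confidence
    half-width drops below `eps` times the running mean.
    """
    rng = random.Random(seed)
    prev = toy_circuit(*(rng.getrandbits(1) for _ in range(4)))
    samples = []
    while True:
        count = 0
        for _ in range(interval):
            # apply a fresh random input pattern and count the toggling nodes
            cur = toy_circuit(*(rng.getrandbits(1) for _ in range(4)))
            count += sum(p != q for p, q in zip(prev, cur))
            prev = cur
        samples.append(count)
        if len(samples) >= min_samples:
            mean = statistics.fmean(samples)
            half_width = z * statistics.stdev(samples) / len(samples) ** 0.5
            if half_width <= eps * mean:
                return mean, len(samples)


mean, n = monte_carlo_switching()
print(f"{mean:.2f} transitions per interval after {n} intervals")
```

For this toy network the estimate should settle close to 39 transitions per 32-cycle interval (about 1.22 per cycle), the exact expectation for independent, equiprobable inputs. The number of intervals needed depends only on the requested accuracy, not on a fixed input vector set.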

On the other hand, probabilistic methods analyze the circuit and generate the expressions for the signal probabilities propagated through the circuit [3, 11, 12, 13]. Hence, they do not depend on the number of input data vectors but only on their statistics. These methods also have problems when analyzing large circuits, as the complexity of the analytical expressions depends on the number of inputs and the logic depth of a circuit.
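The core of the probabilistic approach can be illustrated with the textbook propagation rules for AND, OR and XOR gates, under the simplifying assumption that all signals are spatially and temporally independent; published methods such as [12, 13] go further and handle correlations. The sketch below uses a hypothetical three-gate network and converts each node probability p into a transition probability 2p(1 - p).

```python
def p_and(pa, pb):
    """P(1) at the output of a 2-input AND with independent inputs."""
    return pa * pb


def p_or(pa, pb):
    """P(1) at the output of a 2-input OR with independent inputs."""
    return pa + pb - pa * pb


def p_xor(pa, pb):
    """P(1) at the output of a 2-input XOR with independent inputs."""
    return pa * (1 - pb) + pb * (1 - pa)


def switching(p):
    """Transition probability of a temporally independent signal with P(1) = p."""
    return 2 * p * (1 - p)


def toy_circuit_activity(pa, pb, pc, pd):
    """Expected transitions per cycle of the hypothetical network
    n1 = a AND b, n2 = c OR d, n3 = n1 XOR n2.
    n1 and n2 share no inputs, so treating them as independent is exact here."""
    n1 = p_and(pa, pb)
    n2 = p_or(pc, pd)
    n3 = p_xor(n1, n2)
    return sum(switching(p) for p in (n1, n2, n3))


print(toy_circuit_activity(0.5, 0.5, 0.5, 0.5))  # 1.21875
```

No input vectors are simulated: only the input probabilities enter the computation, which is why the cost depends on circuit size and depth rather than on the length of the input stream.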

Some unconventional approaches to estimating power consumption can be found in [14]. For a low-level estimation technique, the author uses an in-house programming language (AleC++) and a simulator (Alecsis) to extract the total switching capacitances of the circuit. This requires a low-level model with information about the capacitances of each logic element inside the design, and the low-level nature of the model makes it slow for the analysis of large designs. Another approach, also presented in [14], is based on the integration of the supply current waveform. Since this is the most accurate technique for total power consumption estimation, the author proposed using a three-layer neural network to model the area of the supply current impulse.

Power estimation techniques for static and short-circuit power consumption are described in [15] and [16], respectively.

In the FPGA arena, existing power estimation techniques aim to represent power consumption in the form of an equation whose variable parameters depend on various factors (input and output signal statistics, operand word-lengths, circuit fanout, component structure, etc.). Some approaches for FPGA power estimation are presented in [17, 18, 19, 20, 21]. The reported power estimation errors range from 10% to over 30%. While some of them are not compared with real measured power values [17], others are extremely time-consuming (up to 12 hours) [18, 21] or require long calibration procedures [19, 20].

Finally, there are a few tools designed for commercial FPGAs. The most widely used are XPower from Xilinx [22] and PowerPlay from Altera [23]. These tools provide a detailed power breakdown of a design based on the resource capacitance and utilization as well as data switching activity. In their early versions the tools had limited accuracy. Large errors were detected when the estimates were compared to physical measurements [19]. Later versions are becoming more and more sophisticated and accurate. Additional problems are encountered when complex designs with many signals are to be modelled, as these tools require large amounts of memory and long execution times.

In the rest of this paper we will focus on the high-level dynamic power estimation based on a probabilistic approach.

## III. TBT VS. DBT SIGNAL MODEL

For the estimation of dynamic power consumption we use the general approach described in [24], as well as the widely known expression for the dynamic power of a CMOS gate:

$$P = V_{dd}^2 \cdot f \cdot C_l \cdot SW = a \cdot SW \tag{1}$$

where SW is the total switching activity produced inside the design and the constant *a* represents the product of three power terms: the squared supply voltage (known for a specific FPGA architecture), the clock frequency (fixed for a specific design), and the load capacitance $C_l$, which is assumed to be constant due to the regular FPGA structure, as in [17]. The constant *a* is obtained empirically in a calibration process, through a small number of low-level power measurements. The switching activity is computed analytically, as explained below.
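Once *a* is known, Eq. (1) reduces the estimate to a single multiplication. The text does not state how *a* is fitted to the calibration measurements; the sketch below assumes an ordinary least-squares fit through the origin, $a = \sum_k P_k \, SW_k / \sum_k SW_k^2$, where the hypothetical input lists hold the estimated switching activities and the measured dynamic powers of a few calibration designs.

```python
def calibrate_a(switching, measured_power):
    """Least-squares fit of P = a * SW through the origin (assumed fitting method).

    switching:      estimated total switching activity SW of each calibration design
    measured_power: measured dynamic power of the same designs
    """
    if len(switching) != len(measured_power) or not switching:
        raise ValueError("need equally long, non-empty calibration sets")
    num = sum(sw * p for sw, p in zip(switching, measured_power))
    den = sum(sw * sw for sw in switching)
    return num / den


def estimate_power(a, sw_total):
    """Eq. (1): dynamic power estimate P = a * SW."""
    return a * sw_total
```

Fitting through the origin reflects the form of Eq. (1), which has no constant term; with only a few calibration points, a single multiplicative parameter is also less prone to overfitting than a general linear model.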

The switching activity of a signal is determined by its present and immediately-past values: if they differ, a transition has occurred. To calculate the total switching activity of a design, we start from its inputs and determine the switching activity of the input bits. For this purpose, the dual-bit type (DBT) model was presented in [25]. Under the assumption that DSP input signals are stationary and Gaussian-distributed, the DBT model calculates bit-level switching activities as functions of the input bit-widths and the signal statistics: autocorrelation, variance and mean value. Fig. 1 plots the bit switching activity in a Gaussian signal word versus the bit position in the word for different autocorrelations. All the signals have zero mean and the same variance.

There are three switching activity regions that can be clearly distinguished: the LSB region with a fixed switching activity of 0.5, the MSB region with strongly correlated data bits, and the so-called linear region that lies between the two previously mentioned ones.



Fig. 1. Bit switching activity vs. bit position in an input word
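The shape of the curves in Fig. 1 can be reproduced empirically. The sketch below generates a zero-mean Gaussian first-order autoregressive (AR(1)) sequence with lag-one autocorrelation `rho` (an assumed signal model for correlated DSP data), quantizes it to `width`-bit two's-complement words, and counts, for every bit position, how often the bit changes between consecutive words. The word length, standard deviation and sequence length are illustrative.

```python
import math
import random


def ar1_gaussian(n, rho, sigma, seed=0):
    """Zero-mean stationary Gaussian AR(1) sequence with lag-one autocorrelation rho."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma)
    out = [x]
    innov = sigma * math.sqrt(1.0 - rho * rho)  # keeps the variance equal to sigma**2
    for _ in range(n - 1):
        x = rho * x + rng.gauss(0.0, innov)
        out.append(x)
    return out


def bit_switching(samples, width):
    """Per-bit transition probability (LSB first) of width-bit two's-complement words."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    mask = (1 << width) - 1
    # round, saturate to the word range, and keep the two's-complement bit pattern
    words = [max(lo, min(hi, round(s))) & mask for s in samples]
    toggles = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur
        for bit in range(width):
            toggles[bit] += (diff >> bit) & 1
    return [t / (len(words) - 1) for t in toggles]


sw = bit_switching(ar1_gaussian(20000, 0.99, 256.0, seed=1), 16)
print([round(s, 3) for s in sw])
```

With `sigma=256` and `rho=0.99`, the low-order bits toggle with a probability near 0.5, while the sign-extension bits toggle only rarely, mirroring the LSB and MSB regions described above.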

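The breakpoints that divide the regions can be obtained as:

$$BP0 = \left[ \log_2\left(\sqrt{1 - \rho^2} \cdot \sigma\right) \right], \qquad BP1 = \left[ \log_2(6 \cdot \sigma) \right] \tag{2}$$

where $\sigma$ is the standard deviation and $\rho$ the autocorrelation coefficient of the signal.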
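Eq. (2) translates directly into code. The notation [·] is not defined in the text, so the sketch below assumes it denotes rounding to the nearest integer bit position; `sigma` and `rho` are the standard deviation and autocorrelation coefficient of the input signal.

```python
import math


def dbt_breakpoints(sigma, rho):
    """Breakpoints BP0 and BP1 of Eq. (2).

    Assumption: the bracket [.] in Eq. (2) is read as rounding to the nearest integer.
    """
    if sigma <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma > 0 and -1 < rho < 1")
    bp0 = round(math.log2(math.sqrt(1.0 - rho * rho) * sigma))
    bp1 = round(math.log2(6.0 * sigma))
    return bp0, bp1


print(dbt_breakpoints(256.0, 0.0), dbt_breakpoints(256.0, 0.99))  # (8, 11) (5, 11)
```

A higher autocorrelation lowers BP0 and thus widens the linear region, while BP1 depends only on $\sigma$.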

The switching activity of the MSB bits, $sw_{MSB}$, is calculated from its dependency on the probability $p_{MSB}$ that the MSB is '1', as introduced in [26]:

$$sw_{MSB} = 2 \cdot p_{MSB} \cdot (1 - p_{MSB}) \cdot (1 - \rho) \tag{3}$$
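The sketch below evaluates Eq. (3) and assembles a complete DBT bit-level profile. Two assumptions are made: $p_{MSB}$ is taken as the probability that a Gaussian sample with mean $\mu$ and standard deviation $\sigma$ is negative (the sign bit of a two's-complement word is '1' exactly for negative values), and the linear region is filled by straight-line interpolation between 0.5 at BP0 and $sw_{MSB}$ at BP1. The breakpoints are passed in as arguments, e.g. computed from Eq. (2).

```python
import math


def sw_msb(mu, sigma, rho):
    """Eq. (3), with p_MSB taken as P(x < 0) for a Gaussian two's-complement signal."""
    p_msb = 0.5 * (1.0 + math.erf(-mu / (sigma * math.sqrt(2.0))))
    return 2.0 * p_msb * (1.0 - p_msb) * (1.0 - rho)


def dbt_profile(width, bp0, bp1, sw_m):
    """Bit-level switching activities, LSB first: 0.5 below BP0, sw_m from BP1 up,
    and linear interpolation in between (assumed shape of the linear region)."""
    profile = []
    for i in range(width):
        if i < bp0:
            profile.append(0.5)
        elif i >= bp1 or bp1 <= bp0:
            profile.append(sw_m)
        else:
            t = (i - bp0) / (bp1 - bp0)
            profile.append(0.5 + t * (sw_m - 0.5))
    return profile


print(dbt_profile(16, 5, 11, sw_msb(0.0, 256.0, 0.99)))
```

For a zero-mean signal, $p_{MSB} = 0.5$ and Eq. (3) reduces to $0.5\,(1 - \rho)$, so strongly correlated signals have almost no MSB activity.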

Once the bit-level input switching activities are known, the switching activity generated inside the component can be easily obtained. For this purpose, the probability method presented in [24] is used. The approach takes the input bit switching activities and computes the switching parameters of the output and carry bits of the design's components through probabilistic formulas obtained from truth tables of the component's basic cells. Multiplying the estimated switching activity (obtained as the sum of switching activities of all nodes inside the design) by the previously determined constant *a* we obtain the estimated value for the design's dynamic power consumption.
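Truth-table-based propagation through a component's basic cells can be sketched as follows. This is a deliberately simplified illustration, not the method of [24]: every input bit, including the carry entering each cell, is modelled as an independent two-state Markov chain with P(1) = 0.5 and a transition probability equal to its switching activity, and the output transition probability is obtained by enumerating the cell's truth table over all (previous, current) input pairs. The component is a hypothetical ripple-carry adder built from full-adder cells.

```python
from itertools import product


def pair_prob(sw):
    """Joint probability of (previous, current) values of a bit with
    P(1) = 0.5 and transition probability sw (assumed bit model)."""
    return {(0, 0): 0.5 * (1 - sw), (0, 1): 0.5 * sw,
            (1, 0): 0.5 * sw, (1, 1): 0.5 * (1 - sw)}


def cell_output_switching(func, input_sw):
    """Output transition probability of a basic cell, obtained by enumerating
    its truth table over all (previous, current) input pairs.
    Assumes mutually independent input bits."""
    tables = [pair_prob(sw).items() for sw in input_sw]
    total = 0.0
    for combo in product(*tables):
        prob = 1.0
        for _, p in combo:
            prob *= p
        prev_out = func(*(values[0] for values, _ in combo))
        cur_out = func(*(values[1] for values, _ in combo))
        if prev_out != cur_out:
            total += prob
    return total


def full_adder_sum(a, b, c):
    return a ^ b ^ c


def full_adder_carry(a, b, c):
    return (a & b) | (c & (a ^ b))


def ripple_adder_switching(sw_a, sw_b):
    """Total expected transitions of the sum and carry nodes of a ripple-carry
    adder, given the switching activity of each operand bit (LSB first).
    Simplification: each carry is treated as independent of the operand bits."""
    sw_carry = 0.0  # the carry into bit 0 is constant
    total = 0.0
    for sa, sb in zip(sw_a, sw_b):
        sw_sum = cell_output_switching(full_adder_sum, (sa, sb, sw_carry))
        sw_carry = cell_output_switching(full_adder_carry, (sa, sb, sw_carry))
        total += sw_sum + sw_carry
    return total


print(ripple_adder_switching([0.5] * 8, [0.5] * 8))
```

Multiplying the returned total by the calibrated constant *a* of Eq. (1) yields a power estimate for the adder. The independence assumption keeps the enumeration to 64 combinations per three-input cell.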

The DBT signal model, however, has proven to be inaccurate in modelling the bit-level switching activity at the output of some non-linear DSP designs, the binary multiplier being the typical example. It has been noted that the multiplier output has a distribution that is symmetric around the mean value but is not Gaussian [27]. The LSB of the product exhibits a switching activity lower than 0.5 because the product is odd only when all its factors are odd. This is confirmed in Fig. 2, where the bit-level switching activity at the multiplier output is plotted. A new LSB1 signal region containing the LSB bits is clearly noticeable. This region tends to grow as the number of chained multiplications grows. The number of bits affected by the multiplication (breakpoint BPm) is $BPm = 2 \cdot nm$, where nm is the number of multiplied Gaussian processes.
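The odd-times-odd argument can be quantified in the simplest case. Assume (an idealization, not the model of [27]) that the nm operands are uncorrelated in time ($\rho = 0$) and that their LSBs are independent and equiprobable. The product is odd only if all nm operands are odd, which happens with probability $2^{-nm}$, so the LSB switching activity is $2 \cdot 2^{-nm}(1 - 2^{-nm})$. The sketch below compares this value with a Monte Carlo simulation of rounded Gaussian operands; `sigma`, `n` and `seed` are illustrative.

```python
import random


def product_lsb_switching(nm):
    """LSB switching of a product of nm operands whose LSBs are independent,
    equiprobable and temporally uncorrelated (idealized assumption)."""
    p_odd = 2.0 ** -nm
    return 2.0 * p_odd * (1.0 - p_odd)


def simulated_lsb_switching(nm, n=100_000, sigma=1000.0, seed=0):
    """Monte Carlo check with uncorrelated Gaussian operands rounded to integers."""
    rng = random.Random(seed)
    prev, toggles = None, 0
    for _ in range(n):
        prod = 1
        for _ in range(nm):
            prod *= round(rng.gauss(0.0, sigma))
        lsb = prod & 1  # parity; also correct for negative Python integers
        if prev is not None and lsb != prev:
            toggles += 1
        prev = lsb
    return toggles / (n - 1)


for nm in (1, 2, 3):
    print(nm, product_lsb_switching(nm), round(simulated_lsb_switching(nm), 3))
```

For nm = 1 the formula gives 0.5, the usual LSB value of the DBT model; for nm = 2 and nm = 3 it drops to 0.375 and about 0.219, showing how chained multiplications depress the LSB switching activity.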



Fig. 2. Bit-level switching activity at the multiplier output

As the multiplier is a common datapath operator in the hardware implementation of many modern DSP designs (exponential, logarithmic, square-root and reciprocal functions, FIR and IIR filters, FFTs, etc.), power estimates of these designs would be inaccurate if the DBT signal model were used. Consequently, in this paper we present a new TBT signal model, which takes into account the LSB1 signal region and approximates its exponential-like dependence with the following equation:

$$sw(i) = 0.5 - \left(0.5 - sw(0)\right) \cdot e^{-\left(0.25 + 2^{(-nm + 1.25)}\right) \cdot i} \tag{4}$$

where sw(0) is the switching activity of the LSB bit, which is obtained according to the formulas given in [27], *i* is the bit position, and *nm* is the number of Gaussian processes that have been multiplied up to this point. The switching activity of the rest of the bits, as well as the breakpoints BP0 and BP1 are obtained according to the DBT method. Fig. 3 shows that the proposed approximations match well with the actual switching activities.
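Eq. (4) and the rule that the remaining bits follow the DBT method can be combined as below. This is a sketch under two assumptions: sw(0) is supplied by the caller (the text obtains it from the formulas of [27], which are not reproduced here), and the LSB1 region spans bit positions 0 to BPm - 1 with BPm = 2·nm, overwriting the first BPm entries of a DBT profile given LSB first.

```python
import math


def tbt_lsb1(sw0, nm):
    """Eq. (4): switching activity of bits i = 0 .. BPm - 1, with BPm = 2 * nm."""
    k = 0.25 + 2.0 ** (-nm + 1.25)
    return [0.5 - (0.5 - sw0) * math.exp(-k * i) for i in range(2 * nm)]


def tbt_profile(dbt_profile, sw0, nm):
    """Replace the LSB1 region of a DBT profile (LSB first) with Eq. (4);
    the remaining bits keep their DBT values."""
    bpm = 2 * nm
    if bpm > len(dbt_profile):
        raise ValueError("word too short for the LSB1 region")
    return tbt_lsb1(sw0, nm) + list(dbt_profile[bpm:])


# placeholder inputs: a flat 16-bit DBT profile, sw(0) = 0.2, nm = 3
print([round(s, 3) for s in tbt_profile([0.5] * 16, 0.2, 3)])
```

The decay rate $0.25 + 2^{(-nm+1.25)}$ decreases as nm grows, so more bits stay noticeably below 0.5, consistent with the growth of the LSB1 region for chained multiplications.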



Fig. 3. Actual (BLT) vs. estimated (TBT) switching activities in the LSB1 region

For the evaluation of the TBT signal model, the switching activities produced inside four reference designs with different numbers of multipliers have been measured and compared with those obtained by applying the TBT and DBT signal models. The results are reported in Table I.

 
TABLE I

Relative errors between the sums of measured and estimated switching activities

| nm        | 3    | 4    | 5     | 6     |
|-----------|------|------|-------|-------|
| ErrDBT[%] | 4.20 | 6.29 | 6.78  | 10.48 |
| ErrTBT[%] | 0.87 | 1.07 | -0.17 | 2.08  |

As Table I shows, the relative error of the DBT signal model grows with the number of multiplied processes nm, from 4.20% for nm = 3 to 10.48% for nm = 6, whereas the error of the TBT model remains within about 2%.

The impact of the TBT signal model on power estimation will be the subject of the next section.

## IV. EXPERIMENTAL RESULTS

Several DSP designs have been used in the experimental set. On the one hand, each DSP design is implemented in a Virtex-II Pro XC2VP30 FPGA and its power is measured as described in [28]. In brief, the on-board power measurement system consists of two boards: one with a Xilinx Virtex-II Pro FPGA device, whose power is measured, and another with an Altera Stratix FPGA device, used for feeding the input vectors to the first one. In this way, the designs implemented in the Virtex-II device are stimulated externally, so no additional power caused by vector generation influences the measured value. As a result, the measured power corresponds to the static power plus the dynamic power of logic, interconnections and clock. To extract the measured dynamic power, we apply the following procedure. First, we measure the static power when neither input stimuli nor clock are applied. Then, we measure the clock power together with the static power (all zeros applied to the inputs). Finally, we measure the power when various inputs with Gaussian distributions are applied. Subtracting the clock and static power from this total power gives the power of logic and signals. From this value we subtract the power of global connections, computed with an in-house C++ program (MARWEL) [29], which extracts the interconnect lengths from the Xilinx design files. The result is the measured dynamic power consumption of the design.
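The measurement bookkeeping described above amounts to two subtractions. The sketch below writes it out with hypothetical argument names; `p_global` stands for the global-interconnect power that MARWEL computes [29], and the ordering check on the three board measurements is an added sanity assumption, not part of the published procedure.

```python
def measured_dynamic_power(p_static, p_clock_static, p_total, p_global):
    """Dynamic power of the design from the three board measurements.

    p_static:       no clock, no input stimuli
    p_clock_static: clock running, all-zero inputs
    p_total:        clock running, Gaussian input stimuli
    p_global:       estimated power of global connections (e.g. from MARWEL)
    """
    if not p_static <= p_clock_static <= p_total:
        raise ValueError("measurements should not decrease as activity is added")
    p_logic_and_signals = p_total - p_clock_static  # removes clock and static power
    return p_logic_and_signals - p_global
```

Note that `p_static` enters only the sanity check: the clock-plus-static measurement already contains the static power, so a single subtraction removes both.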

On the other hand, for the same DSP design, we apply the dynamic power estimation model described in [24]. The switching activities produced inside the design (SW in Eq. 1) are calculated using the TBT and DBT signal models, as well as using the actual bit-level switching activities of the component inputs (BLT). Measured and estimated dynamic power values are then compared to obtain the relative error of each model. The power estimated by the XPower tool (ISE 10.1) is also included in the comparison.
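The comparison in Table II relies on relative errors. The text does not state the sign convention; the sketch below assumes $Er = 100 \cdot (P_{est} - P_{meas}) / P_{meas}$, so that negative values, such as those of the BLT model for DSP<sub>3</sub>, would indicate underestimation. The function names are hypothetical.

```python
def relative_error(p_est, p_meas):
    """Signed relative error in percent (assumed convention: positive = overestimate)."""
    return 100.0 * (p_est - p_meas) / p_meas


def mean_relative_error(estimates, measurements):
    """Average of the signed relative errors over several test cases."""
    errors = [relative_error(e, m) for e, m in zip(estimates, measurements)]
    return sum(errors) / len(errors)
```

Averaging signed errors lets over- and underestimates cancel; when a model mixes both, averaging absolute values instead gives a more conservative figure.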

The evaluation set consists of three different DSP designs. The first two are relatively small and implement the following arithmetic functions:

$$\begin{aligned} DSP_{1} &= (x_{2} \cdot x_{3}) \cdot x_{2} + (x_{1} + x_{3}) \cdot x_{2} \\ DSP_{2} &= \left((x_{1} + x_{2}) \cdot (x_{3} + x_{4}) + x_{1} \cdot x_{2}\right) \cdot x_{2} \cdot (x_{3} + x_{4}) \end{aligned} \tag{5}$$

The DSP<sub>3</sub> design is considerably larger; its structure resembles a 16-tap digital FIR filter implemented as a cascade of eight second-order sections like the one shown in Fig. 4. All three DSP designs are synchronous. The clock frequency is 50 MHz for the first two designs; given the complexity of the DSP<sub>3</sub> design, its clock frequency is set to only 16 MHz in order to keep the static power constant. Table II shows the results for each design when data with different autocorrelation coefficients, $\rho$, are applied to its inputs.



Fig. 4. Second-order section of DSP<sub>3</sub> design

The second and third columns show the number of occupied slices and the number of embedded multipliers of each DSP design. The next column lists the computation time of the power estimation for each design, followed by the autocorrelation coefficients and the relative errors obtained for each model.

TABLE II

Relative power estimation errors for three signal models (BLT, TBT, DBT) and XPower (XPw)

| Benchmark       | Slices | Emb. mult. | Comp. time [s] | ρ      | Er(BLT) [%] | Er(TBT) [%] | Er(DBT) [%] | Er(XPw) [%] |
|-----------------|--------|------------|----------------|--------|-------------|-------------|-------------|-------------|
| DSP<sub>1</sub> | 212    | 2          | 0.92           | 0      | 10.3        | 7.6         | 17.48       | 328.79      |
|                 |        |            |                | 0.9    | 6.94        | 1.49        | 6.62        | 316.48      |
|                 |        |            |                | 0.99   | 9.33        | 7.6         | 9.59        | 281.70      |
|                 |        |            |                | 0.9995 | 9.49        | 8.71        | 13.68       | 246.91      |
| DSP<sub>2</sub> | 192    | 2          | 1.1            | 0      | 7.92        | 4.5         | 11.58       | 258.45      |
|                 |        |            |                | 0.9    | 7.51        | 1.03        | 7.6         | 233.50      |
|                 |        |            |                | 0.99   | 12.18       | 7.81        | 11.54       | 216.23      |
|                 |        |            |                | 0.9995 | 22.24       | 21.24       | 30.99       | 245.27      |
| DSP<sub>3</sub> | 2977   | 8          | 91.95          | 0      | -0.38       | 9.91        | 38.3        | 455.08      |
|                 |        |            |                | 0.9    | -1.55       | 8.99        | 37.38       | 455.06      |
|                 |        |            |                | 0.99   | -1.56       | 10.05       | 38.24       | 437.46      |
|                 |        |            |                | 0.9995 | -0.78       | 15.25       | 41.27       | 442.62      |

The greater complexity of the DSP<sub>3</sub> design is confirmed by the number of occupied slices as well as by the computation time needed for its power estimation. Considering the relative errors, the TBT model gives more accurate power estimates than the DBT model for all DSP designs and all autocorrelation coefficients. The mean relative errors in the power estimation of the three DSP designs are 8.68% for the TBT model and 22.05% for the DBT model. Furthermore, the TBT model achieves the largest improvement over the DBT model for the DSP<sub>3</sub> design, where its errors are up to about four times smaller. This can be explained by the fact that designs DSP<sub>1</sub> and DSP<sub>2</sub> contain relatively few adders and multipliers and have just a few bits in the LSB1 zone, so the effect of using TBT instead of DBT is less pronounced. The number of adders and multipliers in the DSP<sub>3</sub> design is greater, which increases the number of LSB1 bits, so the impact of using a more accurate signal model is more evident. Finally, the XPower relative errors confirm the observation reported in [19] that such tools show large estimation errors for small designs when compared with physical power consumption measurements.

## V. CONCLUSION

We have presented the TBT signal model, intended for bit-level switching activity calculation and for integration into high-level probabilistic approaches to dynamic power estimation. Unlike the previously used DBT signal model, it takes into account the non-linearities produced at the output of some DSP circuits and introduces a new switching activity region for the LSB bits. The proposed model is not pattern-dependent: it depends only on the input signal statistics and bit-widths, and on the number of prior multiplications inside the design. The validity of the TBT signal model has been confirmed through on-board dynamic power consumption measurements. Furthermore, in comparison with the DBT model, the relative errors of the estimates are considerably lower (four to five times), especially when estimating larger designs with more non-linear DSP circuits.

## ACKNOWLEDGEMENT

This work was supported in part by the Serbian Ministry of Science and Technological Development under project TR-33051 as well as by the Spanish Ministry of Science and Innovation under project TEC2009-14219-C03-02.

## REFERENCES

- [1] Raghunathan, A., Jha, N., Dey, S., "High-level Power Analysis and Optimization", Kluwer Academic Publishers, Massachusetts, 1998.
- [2] Jovanovic, B., Jevtic, R., Carreras, C., "Triple-bit method for power estimation of nonlinear digital circuits in FPGAs", Electronics Letters, Vol. 46, No. 13, June 2010, pp. 903-905.
- [3] Machado, F., "Switching Activity Analysis of Digital Electronic Circuits described at RTL using Probabilistic Techniques. Proposal of an Estimation Method", PhD Thesis, Technical University of Madrid, 2008.
- [4] Deng, C., "Power analysis of CMOS/BiCMOS circuits", In Proc. of the Int. Workshop on Low Power Design, Apr. 1994, pp. 3-8.
- [5] Landman, P., "Low-Power Architectural Design Methodologies", PhD Thesis, Electronic Research Laboratory, Univ. of California, Berkeley, Aug. 1994.
- [6] Schneider, P., "PAPSAS: A Fast Switching Activity Simulator", PATMOS'95, pp. 351-360.
- [7] George, B., "Power Analysis for Semi-Custom Design", CICC, New York 1994, pp. 249-252.
- [8] Burch, R., Najm, F., Yang, P., Trick, T., "A Monte Carlo approach for power estimation", IEEE Trans. on VLSI Systems, Vol. 1, No. 1, Mar. 1993, pp. 63-71.
- [9] Todorovich, E. et al., "A Tool for Activity Estimation in FPGAs", LNCS, June 2002, pp. 340-349.
- [10] Todorovich, E., Boemo, E., "Statistical Power Estimation for FPGAs", LNCS, June 2005, pp. 515-518
- [11] Cirit, M., "Estimating Dynamic Power Consumption of CMOS Circuits", Proc. ICCAD, Nov. 1987, pp.534-537
- [12] Chou, T., Roy, K., Prasad, S., "Estimation of Circuit Activity Considering Signal Correlations and Simultaneous Switching", Proc. of the IEEE/ACM Int. Conf. on CAD, June 1994, pp. 300-303.
- [13] Machado, F., Riesgo, T., Torroja, Y., "Disjoint Region Partitioning for Probabilistic Switching Activity Estimation at Register Transfer Level", PATMOS, Sep. 2008, pp. 1145-1148.

- [14] Maksimovic, D., "Logical Simulation An estimation of limit properties of designed digital circuit", PhD Thesis, University of Nis, June 2000.
- [15] Hussam, H., Dhamin, K., Come, R., "Static Power Estimation of CMOS Logic Blocks in a Library Free Design Environment", Int. Journ. of Design, Analysis and Tools for Circuits and Systems, Vol. 1, No. 1, June 2011, pp. 41-52
- [16] Nose, K., Sakurai, T., "Analysis and Future Trends on Short-Circuit Power", IEEE Trans. on CAD of ICs and Systems, Vol. 19, No. 9, Sep. 2000, pp. 1023-1030.
- [17] Chen, D., Cong, J., Fan, Y., "Low-Power High-Level Synthesis for FPGA Architectures", In Proc. of ISLPED, Aug. 2003, pp. 134-139.
- [18] Choi, S., Jang, J., Mohanty, S., Prasanna, V., "Domain-Specific Modeling for Rapid Energy Estimation of Reconfigurable Architectures", The Journ. of Supercomputing, Vol. 26, No. 3, Nov. 2003, pp. 259-281.
- [19] Elleouet, D., Savary, Y., Julien, N., "An FPGA Power Aware Design Flow", PATMOS, Sep.2006, pp. 415-424
- [20] Abdelli, N., Fouilliart, A., Julien, N., Senn, E., "High-Level Power Estimation on FPGA", IEEE Symp. on Industrial Electronics, June 2007, pp. 925-930.
- [21] Anderson, J., Najm, F., "Power Estimation Techniques for FPGAs", IEEE Trans. on VLSI Systems, Vol. 12, No. 10, Oct. 2004, pp. 1015-1027.
- [22] ftp://ftp.xilinx.com/pub/documentation/tutorials/xpowerfpgatutorial.pdf
- [23] http://www.altera.com/literature/hb/qts/qts\_qii53013.pdf
- [24] Jevtic, R., Carreras, C., Caffarena, G., "Fast and Accurate Power Estimation of FPGA DSP Components Based on High-level Switching Activity Models", Int. Journ. of Elec. Vol. 95, No. 7, July 2008, pp. 653-668.
- [25] Landman, P., Rabaey, J., "Architectural Power Analysis: The Dual Bit Type Method", IEEE Trans. on VLSI Systems, Vol. 3, No. 2, Mar. 1995, pp. 173-187.
- [26] Ramprasad, S., Shanbhag, N., Hajj, I., "Analytical Estimation of Signal Transition Activity from Wordlevel Statistics", IEEE Trans. on CAD of ICs and Systems, Vol. 16, No. 7, 1997, pp. 718-733.
- [27] Bitzaros, D., Nikolaidis, S., "Estimation of bit-level transition activity in data-path based on word-level statistics and conditional entropy", IEE Proc. Circuits Devices Syst., Vol. 149, 2002, pp. 234-240.
- [28] Jevtic, R., Carreras, C., "Power measurement methodology for FPGA devices", IEEE Trans. Instrum. Meas., Vol. 59, No. 9, June 2010, pp. 237-247.
- [29] Jevtic, R., "High-Level Power Estimation of DSP Circuits Implemented in FPGAs", PhD Thesis, Technical University of Madrid, 2009.